Evaluation of Odor Prediction Model Performance and Variable Importance according to Various Missing Imputation Methods

Abstract

The aim of this study is to ascertain the most suitable model for predicting complex odors using odor substance data with a small sample size and a large amount of missing data. First, we compared removal and imputation methods, and imputing the missing data was found to be more effective. Then, in order to recommend a model, we created a total of 126 models (missing imputation: single imputation, multiple imputations, K-nearest neighbor imputation; preprocessing: standardization, principal component analysis, partial least squares; predictive method: regression, machine learning, deep learning) and compared them using R2 and mean absolute error (MAE) values. Finally, we investigated the variable importance of the best prediction model. The results identified the best model as a combination of the multivariate Bayesian ridge method for imputation, standardization for preprocessing, and an extremely randomized tree method for prediction. Among the compounds, methyl mercaptan, acetic acid, and dimethyl sulfide were the most important compounds for predicting complex odors.
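As a rough illustration of two ingredients named in the abstract, the sketch below shows a simple K-nearest neighbor imputation and the two evaluation metrics (R2 and MAE) in plain Python. It is not the authors' implementation: the function names, the distance definition (root mean squared difference over features observed in both rows), the choice of k, and the toy data are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption for this sketch: distance is the root mean squared difference
    over the features that are observed in both rows.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed feature: treat as farthest
        return math.sqrt(sum(shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no donor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: each row is a sample, each column a compound
# concentration; None marks a missing measurement.
toy_rows = [
    [1.0, 2.0],
    [1.1, None],
    [5.0, 9.0],
    [0.9, 2.2],
]
toy_filled = knn_impute(toy_rows, k=2)
print(toy_filled[1])
```

Comparing candidate pipelines then amounts to fitting each one on the imputed data and ranking them by `r2` and `mae` on held-out samples. Libraries such as scikit-learn ship tested versions of these pieces (for example `KNNImputer`, `r2_score`, and `mean_absolute_error`); the abstract does not say which software the authors used.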

Similar resources

Performance evaluation of different estimation methods for missing rainfall data

There are numerous methods for estimating missing values, some of which are chosen depending on the data type and regional climatic characteristics. In this research, part of the monthly precipitation data from the Sarab synoptic station, East Azerbaijan province, Iran, was randomly treated as missing. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...

Variable Importance and Prediction Methods for Longitudinal Problems with Missing Variables

We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that only use a few variables and ignore the dynamic and...

Performance Evaluation of L1-norm-based Microarray Missing Value Imputation

l1-norm minimization was utilized in the imputation of microarray missing values, which is an important procedure in bioinformatics experiments. Two l1 approaches, based on the framework of local least squares (LLS) and iterative bicluster-based least squares (bicluster-iLLS) respectively, were employed. Imputed datasets of the l1 approaches were compared with those of traditional l2 methods. Th...

Performance Evaluation of Missing-Value Imputation Clustering Based on a Multivariate Gaussian Mixture Model

BACKGROUND It is challenging to deal with mixture models when missing values occur in clustering datasets. METHODS AND RESULTS We propose a dynamic clustering algorithm based on a multivariate Gaussian mixture model that efficiently imputes missing values to generate a "pseudo-complete" dataset. Parameters from different clusters and missing values are estimated according to the maximum likel...

Analyzing Data Sets with Missing Data: An Empirical Evaluation of Imputation Methods and Likelihood-Based Methods

Missing data are often encountered in data sets used to construct effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. In this paper, we evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern im...

Journal

Journal title: Applied Sciences

Year: 2022

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app12062826